8 research outputs found

    Missing Data Imputation Using Machine Learning and Natural Language Processing for Clinical Diagnostic Codes

    Imputation of missing data is a common task in supervised classification problems, where the feature matrix of the training dataset has varying degrees of missingness. Most earlier studies do not take into account the presence of the class label in classification problems with missing data. A widely used solution is missing data imputation based on the lazy learning technique k-Nearest Neighbor (KNN). We work on a variant of this imputation algorithm using Gray's distance and Mutual Information (MI), called the Class-weighted Gray's k-Nearest Neighbor (CGKNN) approach. Gray's distance works well with heterogeneous mixed-type data with missing instances, and we weight the distance with MI, a measure of feature relevance, between the features and the class label. This method performs better than traditional methods for classification problems with mixed data, as shown in simulations and applications on University of California, Irvine (UCI) Machine Learning datasets (http://archive.ics.uci.edu/ml/index.php). Loss to follow-up is a common problem in longitudinal data, especially when the study involves multiple visits over a long period of time. If the outcome of interest is available at each time point despite covariates missing due to loss to follow-up (for example, an outcome ascertained through phone calls), then random forest imputation is a good technique for the missing covariates. The missingness involves more complicated interactions over time, since most of the covariates and the outcome have repeated measurements. Random forests are a good non-parametric learning technique that captures complex interactions in mixed-type data. We propose a proximity imputation and a missForest-type covariate imputation with random splits while building the forest. The performance of these imputation techniques is compared to existing techniques in various simulation settings.
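The idea of weighting a nearest-neighbour distance by each feature's mutual information with the class label can be sketched as follows. This is only an illustration under stated assumptions, not the CGKNN algorithm itself: a plain mismatch count stands in for Gray's distance, all features are treated as categorical, and the function names `mutual_information` and `impute` are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def impute(rows, labels, target, col, k=3):
    """Fill rows[target][col] by majority vote among the k nearest complete rows.

    Assumption: distance is an MI-weighted count of mismatching features,
    used here as a simple stand-in for Gray's distance.
    """
    others = [j for j in range(len(rows[0])) if j != col]
    donors = [i for i, r in enumerate(rows) if i != target and None not in r]
    # Weight each feature by its MI with the class label, estimated on complete rows.
    weights = {
        j: mutual_information([rows[i][j] for i in donors], [labels[i] for i in donors])
        for j in others
    }

    def dist(i):
        return sum(weights[j] for j in others if rows[i][j] != rows[target][j])

    nearest = sorted(donors, key=dist)[:k]
    return Counter(rows[i][col] for i in nearest).most_common(1)[0][0]


# Toy data (made up for illustration): the last row is missing its third feature.
rows = [["a", "x", "p"], ["a", "x", "p"], ["b", "y", "q"], ["b", "y", "q"], ["a", "x", None]]
labels = [0, 0, 1, 1, 0]
filled = impute(rows, labels, target=4, col=2, k=2)
```

Features that carry no information about the class label receive a weight near zero, so mismatches on them barely affect which neighbours are chosen.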
The Atherosclerosis Risk in Communities (ARIC) Study Cohort is a longitudinal study, started in 1987-1989, that collects data on participants across four states in the USA and is aimed at studying the factors behind heart disease. We consider patients at the 5th visit (which occurred in 2013) who were enrolled in continuous Medicare Fee-For-Service (FFS) insurance in the 6 months prior to their visit, so that their hospitalization diagnostic (ICD) codes are available. Our aim is to characterize the hospitalization of patients with cognitive status ascertainment (classified as dementia, mild cognitive disorder, or no cognitive disorder) at the 5th visit. Diagnostic codes for inpatient and outpatient visits, identified from CMS (Centers for Medicare & Medicaid Services) Medicare FFS data linked with ARIC participant data, are stored as International Classification of Diseases and Related Health Problems (ICD) codes. We treat these codes as a bag-of-words model to apply text mining techniques and obtain meaningful clusters of ICD codes. Doctor of Philosophy
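Treating diagnostic codes as a bag of words amounts to counting, for each patient, how often each code occurs and ignoring their order. A minimal sketch of that representation follows; the patient identifiers and code lists are made up for illustration, and the helper names `bag_of_codes` and `to_matrix` are hypothetical. The resulting count matrix is the kind of input a clustering or topic-modelling step would take.

```python
from collections import Counter


def bag_of_codes(visits):
    """Count how often each ICD code appears across one patient's visits."""
    return Counter(code for visit in visits for code in visit)


def to_matrix(patients):
    """Turn per-patient code counts into rows over a shared, sorted vocabulary."""
    vocab = sorted({code for bag in patients.values() for code in bag})
    matrix = {pid: [bag.get(code, 0) for code in vocab] for pid, bag in patients.items()}
    return vocab, matrix


# Illustrative input only: each inner list is the set of codes on one claim.
patients = {
    "p1": bag_of_codes([["428.0", "401.9"], ["428.0"]]),
    "p2": bag_of_codes([["250.00"]]),
}
vocab, matrix = to_matrix(patients)
```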

    A Computational Framework for Multivariate Convex Regression and Its Variants

    We study the nonparametric least squares estimator (LSE) of a multivariate convex regression function. The LSE, given as the solution to a quadratic program with O(n²) linear constraints (n being the sample size), is difficult to compute for large problems. Exploiting problem-specific structure, we propose a scalable algorithmic framework based on the augmented Lagrangian method to compute the LSE. We develop a novel approach to obtain smooth convex approximations to the fitted (piecewise affine) convex LSE and provide formal bounds on the quality of approximation. When the number of samples is not too large compared to the dimension of the predictor, we propose a regularization scheme, Lipschitz convex regression, in which we constrain the norm of the subgradients, and we study the rates of convergence of the obtained LSE. Our algorithmic framework is simple and flexible and can be easily adapted to handle variants: estimation of a nondecreasing/nonincreasing convex/concave function (with or without a Lipschitz bound). We perform numerical studies illustrating the scalability of the proposed algorithm: on some instances our proposal leads to more than a 10,000-fold improvement in runtime compared to off-the-shelf interior point solvers for problems with n = 500. Keywords: Augmented Lagrangian method; Lipschitz convex regression; Nonparametric least squares estimator; Scalable quadratic programming; Smooth convex regression. United States. Office of Naval Research (Grant N00014-15-1-2342)
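For orientation, the quadratic program referred to above is usually written in the following standard form. The notation is an assumption, since the abstract does not fix symbols: $(x_i, y_i)$ are the $n$ observations with $x_i \in \mathbb{R}^d$, $\theta_i$ is the fitted value at $x_i$, and $\xi_i \in \mathbb{R}^d$ is a subgradient of the fitted function at $x_i$.

```latex
\[
\min_{\theta_1,\dots,\theta_n,\ \xi_1,\dots,\xi_n}\ \sum_{i=1}^{n} \left(y_i - \theta_i\right)^2
\quad \text{subject to} \quad
\theta_j \ \ge\ \theta_i + \xi_i^{\top}\left(x_j - x_i\right)
\quad \text{for all } i \neq j .
\]
```

There is one constraint per ordered pair $(i, j)$, that is $n(n-1)$ in total, which is where the $O(n^2)$ count comes from. A Lipschitz variant of the kind described would add a bound $\lVert \xi_i \rVert \le L$ on each subgradient; the choice of norm and the constant $L$ are not specified in the abstract.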

    Bempegaldesleukin Plus Nivolumab in First-Line Renal Cell Carcinoma: Results From the PIVOT-02 Study

    BACKGROUND: Immune checkpoint inhibitor-based combinations have expanded the treatment options for patients with renal cell carcinoma (RCC); however, tolerability remains challenging. The aim of this study was to evaluate the safety and efficacy of the immunostimulatory interleukin-2 cytokine prodrug bempegaldesleukin (BEMPEG) plus nivolumab (NIVO) as first-line therapy in patients with advanced clear-cell RCC. METHODS: This was an open-label multicohort, multicenter, single-arm phase 1/2 study; here, we report results from the phase 1/2 first-line RCC cohort (N=49). Patients received BEMPEG 0.006 mg/kg plus NIVO 360 mg intravenously every 3 weeks. The primary objectives were safety and objective response rate (ORR; patients with measurable disease at baseline and at least one postbaseline tumor response assessment). Secondary objectives included overall survival (OS) and progression-free survival (PFS). Exploratory biomarker analyses: association between baseline biomarkers and ORR. RESULTS: At a median follow-up of 32.7 months, the ORR was 34.7% (17/49 patients); 3/49 patients (6.1%) had a complete response. Of the 17 patients with response, 14 remained in response for >6 months, and 6 remained in response for >24 months. Median PFS was 7.7 months (95% CI 3.8 to 13.9), and median OS was not reached (95% CI 37.3 to not reached). Ninety-eight per cent (48/49) of patients experienced ≥1 treatment-related adverse event (TRAE) and 38.8% (19/49) had grade 3/4 TRAEs, most commonly syncope (8.2%; 4/49) and increased lipase (6.1%; 3/49). No association between exploratory biomarkers and ORR was observed. Limitations include the small sample size and single-arm design. CONCLUSIONS: BEMPEG plus NIVO showed preliminary antitumor activity as first-line therapy in patients with advanced clear-cell RCC and was well tolerated. These findings warrant further investigation.

    Bempegaldesleukin Plus Nivolumab in First-Line Metastatic Urothelial Carcinoma: Results From PIVOT-02

    BACKGROUND: Despite recent changes in the treatment landscape, there remains an unmet need for effective, tolerable, chemotherapy-free treatments for patients with advanced/metastatic urothelial carcinoma (mUC), especially cisplatin-ineligible patients. OBJECTIVE: To evaluate the immunostimulatory interleukin-2 cytokine prodrug bempegaldesleukin (BEMPEG) plus nivolumab in patients with advanced/mUC from the phase 2 multicenter PIVOT-02 study. DESIGN, SETTING, AND PARTICIPANTS: This open-label, multicohort phase 1/2 study enrolled patients with previously untreated locally advanced/surgically unresectable or mUC (N = 41). INTERVENTION: Patients received BEMPEG 0.006 mg/kg plus nivolumab 360 mg intravenously every 3 wk. OUTCOME MEASUREMENTS AND STATISTICAL ANALYSIS: The primary objectives were safety and the objective response rate (ORR) in patients with measurable disease at baseline and at least one postbaseline tumor response assessment (response-evaluable). Secondary objectives were overall survival (OS) and progression-free survival (PFS). Exploratory biomarker analyses via univariate logistic regression were performed to test the association between potential biomarkers (CD8 tumor-infiltrating lymphocytes, tumor mutational burden, and IFN-γ gene expression profile) and response. RESULTS AND LIMITATIONS: The ORR was 35% (13/37 evaluable patients) and the complete response rate was 19% (7/37 patients); the median duration of response was not reached. Median PFS was 4.1 mo (95% confidence interval [CI] 2.1-8.7) and median OS was 23.7 mo (95% CI 15.8-not reached). Overall, 40/41 patients (98%) experienced at least one treatment-related adverse event (TRAE); grade 3/4 TRAEs occurred in 11 patients (27%), most commonly pyrexia (4.9%; 2 patients). Exploratory biomarker analyses showed no association between biomarkers and response. Limitations include the small sample size and single-arm design. CONCLUSIONS: BEMPEG plus nivolumab was well tolerated and showed antitumor activity as first-line treatment in patients with locally advanced/mUC. PATIENT SUMMARY: We investigated an immune-stimulating prodrug called bempegaldesleukin plus the antibody nivolumab as the first therapy for patients with advanced or metastatic cancer of the urinary tract. This combination had manageable treatment-related side effects and was effective in a subset of patients. This trial is registered at ClinicalTrials.gov as NCT02983045.
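The percentages in this abstract use two different denominators: response rates are computed over the 37 response-evaluable patients, while adverse-event rates are computed over all 41 enrolled patients. The short check below reproduces the reported figures from the counts given above. The helper `pct` and the constant names are illustrative, not taken from the study.

```python
def pct(events, denominator, digits=0):
    """Return events/denominator as a percentage string with `digits` decimals."""
    return f"{100 * events / denominator:.{digits}f}%"


ENROLLED = 41   # all enrolled patients, used for adverse-event rates
EVALUABLE = 37  # response-evaluable patients, used for response rates

orr = pct(13, EVALUABLE)              # objective response rate
cr = pct(7, EVALUABLE)                # complete response rate
any_trae = pct(40, ENROLLED)          # at least one TRAE
grade34 = pct(11, ENROLLED)           # grade 3/4 TRAEs
pyrexia = pct(2, ENROLLED, digits=1)  # most common grade 3/4 TRAE
```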